6 research outputs found

    GeoLayoutLM: Geometric Pre-training for Visual Information Extraction

    Visual information extraction (VIE) plays an important role in Document Intelligence. Generally, it is divided into two tasks: semantic entity recognition (SER) and relation extraction (RE). Recently, pre-trained models for documents have achieved substantial progress in VIE, particularly in SER. However, most existing models learn the geometric representation implicitly, which has been found insufficient for the RE task, since geometric information is especially crucial for RE. Moreover, we reveal that another factor limiting RE performance is the objective gap between the pre-training phase and the fine-tuning phase for RE. To tackle these issues, we propose a multi-modal framework for VIE, named GeoLayoutLM. GeoLayoutLM explicitly models geometric relations in pre-training, which we call geometric pre-training. Geometric pre-training is achieved by three specially designed geometry-related pre-training tasks. Additionally, novel relation heads, which are pre-trained by the geometric pre-training tasks and fine-tuned for RE, are designed to enrich and enhance the feature representation. According to extensive experiments on standard VIE benchmarks, GeoLayoutLM achieves highly competitive scores on the SER task and significantly outperforms the previous state of the art for RE (e.g., the F1 score of RE on FUNSD is boosted from 80.35% to 89.45%). The code and models are publicly available at https://github.com/AlibabaResearch/AdvancedLiterateMachinery/tree/main/DocumentUnderstanding/GeoLayoutLM
    Comment: CVPR 2023 Highlight

    Vision Grid Transformer for Document Layout Analysis

    Document pre-trained models and grid-based models have proven to be very effective on various tasks in Document AI. However, for the document layout analysis (DLA) task, existing document pre-trained models, even those pre-trained in a multi-modal fashion, usually rely on either textual features or visual features. Grid-based models for DLA are multi-modal but largely neglect the effect of pre-training. To fully leverage multi-modal information and exploit pre-training techniques to learn better representations for DLA, in this paper we present VGT, a two-stream Vision Grid Transformer, in which a Grid Transformer (GiT) is proposed and pre-trained for 2D token-level and segment-level semantic understanding. Furthermore, a new dataset named D^4LA, which is so far the most diverse and detailed manually annotated benchmark for document layout analysis, is curated and released. Experimental results show that the proposed VGT model achieves new state-of-the-art results on DLA tasks, e.g. PubLayNet (95.7% → 96.2%), DocBank (79.6% → 84.1%), and D^4LA (67.7% → 68.8%). The code and models as well as the D^4LA dataset will be made publicly available at https://github.com/AlibabaResearch/AdvancedLiterateMachinery.
    Comment: Accepted by ICCV202

    A coverless information hiding algorithm based on gradient matrix


    Joint Copying and Restricted Generation for Paraphrase

    Many natural language generation tasks, such as abstractive summarization and text simplification, are paraphrase-orientated. In these tasks, copying and rewriting are the two main writing modes. Most previous sequence-to-sequence (Seq2Seq) models use a single decoder and neglect this fact. In this paper, we develop a novel Seq2Seq model that fuses a copying decoder and a restricted generative decoder. The copying decoder finds the position to be copied based on a typical attention model. The generative decoder produces words restricted to a source-specific vocabulary. To combine the two decoders and determine the final output, we develop a predictor that predicts the mode of copying or rewriting. This predictor can be guided by the actual writing mode in the training data. We conduct extensive experiments on two different paraphrase datasets. The results show that our model outperforms the state-of-the-art approaches in terms of both informativeness and language quality.

    Aerodynamic modification and optimization of intermediate pressure compressor in marine intercooled recuperated gas turbine

    The intercooling recuperated cycle (ICR) is commonly employed in marine gas turbines to enhance thermal efficiency. However, the addition of an intercooler may increase the dimensions and structural complexity of marine ICR gas turbines. To address this issue, we propose an improved configuration of the intermediate pressure compressor: an axial-centrifugal combined compressor (ACC) with a high inlet hub-tip ratio and high flow rate. The aerodynamic performance of the ACC at multiple operating points is optimized via an improved free-form deformation method for the parametric modeling of the flow paths and blades of turbomachinery. The results indicate that the number of stages decreases from 6 to 3 and the axial length is reduced by 38.3% after modification. The adiabatic efficiency of the optimized ACC at the design and low-speed operating points is improved by 1.18% and 6.48%, respectively. Additionally, the redesigned ACC scheme can reduce the axial length, maximize the flow path space utilization, enhance the stage load capacity, and significantly improve the low-speed performance. This provides a reference for developing advanced marine ICR gas turbines with high power and low fuel consumption.

    Hypoxic mesenchymal stem cell-derived exosomes promote the survival of skin flaps after ischaemia–reperfusion injury via mTOR/ULK1/FUNDC1 pathways

    Flap necrosis, the most prevalent postoperative complication of reconstructive surgery, is significantly associated with ischaemia–reperfusion injury. Recent research indicates that exosomes derived from bone marrow mesenchymal stem cells (BMSCs) hold potential therapeutic applications in several diseases. Traditionally, BMSCs are cultured under normoxic conditions, a setting that diverges from their physiological hypoxic environment in vivo. Consequently, we propose a method involving the hypoxic preconditioning of BMSCs, aimed at exploring the function and the specific mechanisms of their exosomes in ischaemia–reperfusion skin flaps. This study constructed a 3 × 6 cm² caudal superficial epigastric skin flap model and subjected it to ischaemic conditions for 6 h. Our findings reveal that exosomes from hypoxia-pretreated BMSCs (Hypo-Exo) significantly promoted flap survival, decreased MCP-1, IL-1β, and IL-6 levels in ischaemia–reperfusion injured flaps, and reduced oxidative stress injury and apoptosis. Moreover, the results indicated that Hypo-Exo protects vascular endothelial cells from ischaemia–reperfusion injury both in vivo and in vitro. Through high-throughput sequencing and bioinformatics analysis, we further compared the differential miRNA expression profiles between Hypo-Exo and normoxic exosomes. The results showed enrichment of several pathways, including autophagy and mTOR. We have also elucidated a mechanism by which Hypo-Exo promotes the survival of ischaemia–reperfusion injured flaps. This mechanism involves carrying large amounts of miR-421-3p, which targets and regulates mTOR, thereby upregulating the expression of phosphorylated ULK1 and FUNDC1 and subsequently further activating autophagy. In summary, hypoxic preconditioning constitutes an effective and promising method for optimizing the therapeutic effects of BMSC-derived exosomes in the treatment of flap ischaemia–reperfusion injury.